At work we have a daemon that I wrote in Perl. It was written to create a new database handle every time it needed to do something, and it uses DBI with DBD::Oracle for database connectivity. A while back (a few months ago at least, maybe more) we noticed that it was using quite a bit of resident memory; the amount varies with how long it has been running and how many messages it has processed. This became an increasingly important issue as we drew nearer to the end of our internal testing cycle.
Yesterday we were told to find out how bad it really is and what we can do to fix it. We tracked it for a few hours, and at a sustained rate of 2 messages/s we measured the memory leak at roughly 5 KB/s!
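For anyone who wants to take the same kind of measurement, here is a rough sketch of the arithmetic. The helper names (rss_kb, leak_rate_kb_per_s, leak_per_message_kb) are made up for illustration and are not what we actually ran; it also assumes a ps that understands `-o rss=` and reports kilobytes, which is typical on Linux.

```perl
use strict;
use warnings;

# Hypothetical helper: resident set size of a process in KB, as reported by ps.
sub rss_kb {
    my ($pid) = @_;
    die "pid must be numeric\n" unless defined $pid && $pid =~ /^\d+$/;
    my $out = `ps -o rss= -p $pid`;
    my ($kb) = $out =~ /(\d+)/;
    die "could not read RSS for pid $pid\n" unless defined $kb;
    return $kb;
}

# Growth in KB/s between the first and last [epoch_seconds, rss_kb] samples.
sub leak_rate_kb_per_s {
    my @samples = @_;
    my ($t0, $rss0) = @{ $samples[0] };
    my ($t1, $rss1) = @{ $samples[-1] };
    die "samples must span a non-zero interval\n" if $t1 <= $t0;
    return ($rss1 - $rss0) / ($t1 - $t0);
}

# Average growth attributed to each processed message.
sub leak_per_message_kb {
    my ($kb_per_s, $msgs_per_s) = @_;
    return $kb_per_s / $msgs_per_s;
}
```

With the numbers above, leak_per_message_kb(5, 2) works out to 2.5 KB leaked per message, which is a handier figure to compare across servers that see different message rates.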
I was told to keep that instance running and to start a modified copy on another server that would use one persistent database handle for everything. After it had run for a few hours at a sustained rate of 2 messages/s, we found that there was still a memory leak, but a smaller one: only 1.8 KB/s.
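The persistent-handle idea looks roughly like the sketch below. This is not the daemon's actual code: make_handle_provider is a hypothetical helper, and the DSN and credentials in the usage comment are placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper: wraps a connect routine in a closure that keeps one
# handle alive and only reconnects when the handle is missing or fails ping.
sub make_handle_provider {
    my ($connect) = @_;    # code ref that returns a fresh database handle
    my $dbh;
    return sub {
        $dbh = $connect->() unless $dbh && $dbh->ping;
        return $dbh;
    };
}

# Assumed usage with DBI (DSN and credentials are placeholders):
#
#   use DBI;
#   my $get_dbh = make_handle_provider(sub {
#       DBI->connect('dbi:Oracle:ORCL', 'user', 'password',
#                    { RaiseError => 1, AutoCommit => 1 });
#   });
#   my $sth = $get_dbh->()->prepare_cached('SELECT 1 FROM dual');
```

Every caller asks the closure for the handle instead of calling connect itself, so there is only ever one connection to leak from. DBI also ships connect_cached, which may be enough on its own if you don't need the ping-and-reconnect logic in one place.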
Some searching revealed that DBI 1.52 has been out for a while (we're using DBI 1.50). This version fixes two memory leaks: one involving handles (an unspecified amount of memory) and a second involving statement handles (16 bytes of memory). We were told to install Oracle 10.2 (we use 10.1), DBI 1.52 and the latest version of DBD::Oracle (to be safe). Well, that didn't go so well. The 10.2 installation process left the Oracle directories with odd permissions, so we weren't able to use the library files (the .so's). The DBA was told to reinstall 10.1.
This morning, I installed DBI 1.52 on the system running the modified daemon. I restarted the daemon and let it run. A while later we checked on it and the memory usage was the same as yesterday! I was crushed. I knew I would have to rewrite this thing from scratch if we couldn't figure out the source of the leaks.
I started commenting out huge blocks of code on the various servers. On one I disabled the message acknowledgments. On another I disabled almost all of the database transactions. On yet another I disabled everything but the networking components. Then I waited. After a while I checked their memory usage in order. The first was the same as before I commented out the code. I was a bit discouraged but moved on to the second. It was a huge difference in memory usage! How could this be, though? Why would the database code make this big a difference if installing DBI 1.52 didn't? I went back to the server using the single db handle and checked the version of DBI being used. 1.50. 1.50? I checked my lib path and saw that someone had changed it on me overnight. I quickly changed it back and restarted the daemon. I was hopeful; almost relieved. One more shot at this working, so I waited.
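In hindsight, a quick check of which copy of DBI the interpreter actually loads would have saved me some grief. Something like the sketch below would do it; module_info is a hypothetical helper, and the script takes the module name as an optional argument, defaulting to DBI.

```perl
use strict;
use warnings;

# Hypothetical helper: load a module by name and report its version
# and the file it was actually loaded from (looked up in %INC).
sub module_info {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    require $file;
    return ($module->VERSION, $INC{$file});
}

my $module = shift @ARGV || 'DBI';
my @info = eval { module_info($module) };
if (@info) {
    my ($version, $path) = @info;
    $version = 'unknown' unless defined $version;
    print "$module $version loaded from $path\n";
}
else {
    warn "could not load $module: $@";
}
```

Run it with the same environment and lib path the daemon uses. If the path it prints isn't the one you expect, something in the lib path is picking up an older install.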
About 30 minutes later, I checked the memory usage again. It had gone up a little and stabilized. I was thrilled! I went around telling everyone the good news. We were told to check it again on Monday, but I am certain that I won't have to rewrite the daemon.
DBI 1.52 has saved me from certain doom. Thank you Nicholas Clark, Ephraim Dan and Doru Theodor Petrescu for those patches!